Any national cuisine is the sum total of its regional cuisines, which are the cultural and
historical identifiers of their respective regions. India is home to a number of regional cuisines that
showcase its culinary diversity. Here, we study recipes from eight regional cuisines of India
spanning various geographies and climates. We investigate the phenomenon of food pairing, which
examines the compatibility of two ingredients in a recipe in terms of their shared flavor compounds.

Food pairing was quantified at the level of cuisines, recipes and ingredient pairs by measuring
flavor sharing between pairs of ingredients. Our results indicate that each regional cuisine follows
a negative food pairing pattern: the greater the flavor sharing between two ingredients, the lower their
co-occurrence in that cuisine. We find that the frequency of ingredient usage is central in rendering the
characteristic food pairing in each of these cuisines. Spice and dairy emerged as the most significant
ingredient classes responsible for the biased pattern of food pairing. Interestingly, while individual
spices contribute to negative food pairing, dairy products tend to shift food
pairing towards the positive side. Our data analytical study, highlighting statistical properties of the
regional cuisines, brings out their culinary fingerprints, which could be used to design algorithms for
generating novel recipes and for recipe recommender systems. It forms a basis for exploring a possible
causal connection between diet and health, as well as for prospecting therapeutic molecules from food
ingredients. Our study also provides insights into how big data can change the way we look at
food.

PACS numbers: 89.75.-k, 82.20.Wt, 87.18.Vf, 87.10.Vg, 89.90.+n


I. INTRODUCTION 

Cooking is a unique trait humans possess and is believed
to be a major cause of increased brain size [1-3].
While cooking encompasses an array of food processing
techniques [4], cuisine is an organized series of
food preparation procedures intended to create tasty and
healthy food. India has a unique blend of culturally and
climatically diverse regional cuisines. Its culinary history
dates back to the early Indus valley civilization [?].
Indian dietary practices are deeply rooted in notions of
disease prevention and promotion of health.

Food perception involving olfactory and gustatory
mechanisms is the primary influence on food preferences
in humans. These preferences are also determined by a
variety of factors such as culture, climate, geography and
genetics, leading to the emergence of regional cuisines [?].
Food pairing is the idea that ingredients having similar
flavor constitution may taste well together in a recipe. Chef
Blumenthal was the first to propose this idea, which in
this study we term positive food pairing [13]. Studies
by Ahn et al. found that North American, Western European
and Latin American recipes follow this food pairing
pattern, whereas certain others, such as Korean and
Southern European cuisines, do not [?]. Our
previous study of food pairing in Indian cuisine revealed
a strong negative food pairing pattern in its recipes [16].

Knowing that each of the regional cuisines has its
own identity, the question we seek to answer in this paper
is whether the negative food pairing pattern in Indian
cuisine is a consistent trend observed across all of the
regional cuisines or an averaging effect. To answer
this question, we investigated eight geographically
and culturally prominent regional cuisines, viz. Bengali,
Gujarati, Jain, Maharashtrian, Mughlai, Punjabi,
Rajasthani and South Indian. The pattern of food pairing
was studied at the level of cuisine, recipes and ingredient
pairs. Such a multi-tiered study of these cuisines
provided a thorough understanding of their characteristics in
terms of ingredient usage pattern. We further identified
the features that contribute to food pairing, thereby
revealing the role of ingredients and ingredient categories
in determining food pairing of the regional cuisines.


Availability of large datasets in the form of cookery
blogs and recipe repositories has prompted the use
of big data analytical techniques in food science and
has led to the emergence of computational gastronomy.
This field has made advances through many recent studies [?],
which are changing the overall outlook
of culinary science. Our study is an offshoot
of this approach. We use statistical and computational
models to analyse food pairing in the regional
cuisines. Our study reveals the characteristic signature
of each Indian regional cuisine by looking at the recipe
and ingredient level statistics of the cuisine.

II. RESULTS AND DISCUSSION

Details of recipes, ingredients, and their corresponding
flavor compounds constitute the primary data required
for the study of food pairing in a cuisine. Much of this is
documented in books and, more recently, in
online recipe sources. We obtained the Indian cuisine
recipe data from one of the popular cookery websites,
TarlaDalal.com [19]. The flavor profiles of ingredients
were compiled using previously published data [15] and
through an extensive literature survey. Table I lists details
of recipes and ingredients in each of the regional cuisines.


TABLE I. Statistics of regional cuisines

Cuisine          Recipe count    Ingredient count
Bengali                   156                 102
Gujarati                  392                 112
Jain                      447                 138
Maharashtrian             130                  93
Mughlai                   179                 105
Punjabi                  1013                 152
Rajasthani                126                  78
South Indian              474                 114

Recipes of size ≥ 2 were considered for the purpose of flavor
analysis.


The ingredients belonged to the following 15 categories:
spice, vegetable, fruit, plant derivative, nut/seed,
cereal/crop, dairy, plant, pulse, herb, meat, fish/seafood,
beverage, animal product, and flower. Category-wise
ingredient statistics of the regional cuisines are provided in Section V A.


A. Statistics of recipe size and ingredient frequency 

We started with an investigation of preliminary statistics
of the regional cuisines. All eight regional cuisines under
consideration showed bounded recipe-size distributions
(Figure 1). While most cuisines followed a unimodal
distribution, Mughlai cuisine showed a strong bimodal
distribution and had recipes of large sizes when compared
with the rest. This could be an indication of the
fact that Mughlai is derived from a royal cuisine. To
understand the ingredient usage pattern, we ranked
ingredients in decreasing order of usage frequency within
each cuisine. As shown in Figure 2, all cuisines showed
strikingly similar ingredient usage profiles, reflecting the
pattern of Indian cuisine as a whole (Figure 2, inset). While
indicating a generic culinary growth mechanism, the
distributions also show that certain ingredients are used far
more often than others, reflecting their inherent ‘fitness’ or
popularity within the cuisine.
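The two statistics described above can be sketched in a few lines. This is a minimal illustration, assuming each recipe is stored as a set of ingredient names; the toy recipes and the function names `recipe_size_distribution` and `frequency_rank` are hypothetical and not taken from the study.

```python
from collections import Counter

# Hypothetical toy cuisine: each recipe is a set of ingredient names.
recipes = [
    {"rice", "cumin", "turmeric"},
    {"rice", "milk", "cardamom"},
    {"potato", "cumin", "turmeric", "cayenne"},
    {"milk", "cardamom"},
]

def recipe_size_distribution(recipes):
    """Return {size s: probability of finding a recipe of size s}."""
    counts = Counter(len(r) for r in recipes)
    total = len(recipes)
    return {s: c / total for s, c in sorted(counts.items())}

def frequency_rank(recipes):
    """Return [(rank, ingredient, frequency)], where rank 1 is the most used."""
    freq = Counter(ing for r in recipes for ing in r)
    # Sort by decreasing frequency; ties are broken alphabetically (an
    # arbitrary choice made here only to keep the output deterministic).
    ordered = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, ing, f) for rank, (ing, f) in enumerate(ordered, start=1)]

size_dist = recipe_size_distribution(recipes)
ranking = frequency_rank(recipes)
```

Plotting `size_dist` against recipe size, and frequency against rank on logarithmic axes, would give toy analogues of Figures 1 and 2.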



FIG. 1. Recipe size distributions. Plot of probability of 
finding a recipe of size s in the cuisine. Consistent with other 
cuisines, the distributions are bounded. Mughlai and Punjabi 
cuisines have recipes of large sizes compared to other cuisines. 



FIG. 2. Frequency-Rank distributions. Ingredients
ranked by their frequency of use in the cuisine. The higher
the occurrence, the better the rank of the ingredient. All the
cuisines have similar ingredient distribution profiles, indicating
a generic culinary growth mechanism. Inset shows the ingredient
frequency-rank distribution for the whole Indian cuisine.


B. Food pairing hypothesis 

The food pairing hypothesis is a popular notion in culinary
science. It asserts that two ingredients sharing common
flavor compounds taste well when used together in
a recipe. This hypothesis has been confirmed for a few
cuisines such as North American, Western European and
Latin American [?]. In contrast, Korean and Southern
European cuisines have been shown to deviate from positive
food pairing. Our previous study of food pairing in
Indian cuisine at the level of cuisine, sub-cuisines, recipes
and ingredient pairs has shown that it is characterized
by a strong negative food pairing [16]. We quantify
food pairing with the help of flavor profiles of ingredients.
A flavor profile is the set of volatile chemical
compounds that render the characteristic taste and smell
of the ingredient. Starting with the flavor profiles of each
of the ingredients, the average food pairing of a recipe ($N_s^R$)
as well as that of the cuisine ($N_s$) was computed as
illustrated in Figure 3. The deviation of $N_s$ of
the cuisine from that of a ‘random cuisine’
measures the bias in food pairing. The higher (lower) the
value of $N_s$ relative to that of its random counterpart, the more
positive (negative) the food pairing.
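A minimal sketch of this computation follows, assuming flavor profiles are available as sets of compound identifiers. The ingredient names, compound labels and function names below are hypothetical and serve only to illustrate how $N_s^R$ and $N_s$ are obtained.

```python
from itertools import combinations

# Hypothetical flavor profiles: ingredient -> set of flavor compound IDs.
flavor = {
    "milk": {"c1", "c2", "c3"},
    "butter": {"c2", "c3", "c4"},
    "cayenne": {"c5"},
    "cumin": {"c5", "c6"},
}

def recipe_pairing(recipe, flavor):
    """N_s^R: shared compounds averaged over all ingredient pairs of a recipe."""
    pairs = list(combinations(sorted(recipe), 2))
    if not pairs:
        raise ValueError("a recipe needs at least two ingredients")
    shared = sum(len(flavor[a] & flavor[b]) for a, b in pairs)
    return shared / len(pairs)

def cuisine_pairing(recipes, flavor):
    """N_s: the recipe-level values averaged over all recipes of a cuisine."""
    return sum(recipe_pairing(r, flavor) for r in recipes) / len(recipes)

recipes = [{"milk", "butter"}, {"milk", "cayenne", "cumin"}]
n_s = cuisine_pairing(recipes, flavor)
```

Dividing by the number of pairs, s(s-1)/2 for a recipe of size s, is what makes recipes of different sizes comparable.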


[Figure 3 schematic: a cuisine is shown as a set of recipes R_1, R_2, ..., R_n with their values N_s^{R_1}, ..., N_s^{R_n}; for an example recipe R with s = 4 ingredients, N denotes the number of shared flavour compounds between two ingredients, from which N_s^R is computed.]

FIG. 3. Schematic for calculation of ‘average $N_s$’.
Illustration of the procedure for calculating the average $N_s$ for a
given cuisine. Beginning with an individual recipe, the average
$N_s$ of the recipe ($N_s^R$) was calculated. Averaging $N_s^R$ over all
the recipes returned $N_s$ of the cuisine.


C. Regional cuisines of India exhibit negative food 
pairing 

We found that all regional cuisines are invariably characterized
by average food pairing lower than expected by
chance. This characteristic negative food pairing, however,
varied in its extent across cuisines. Mughlai cuisine,
for example, displayed the least inclination towards
negative pairing ($\Delta N_s = N_s^{Mughlai} - N_s^{Rand} = -0.758$
and Z-score of $-10.232$), whereas Maharashtrian cuisine
showed the most negative food pairing
($\Delta N_s = N_s^{Maharashtrian} - N_s^{Rand} = -4.523$ and Z-score of
$-52.047$). Figure 4 depicts the generic food pairing pattern
observed across regional cuisines of India. We found that
the negative food pairing is independent of recipe size, as
shown in Figure 5. This indicates that the bias in food
pairing is not an artefact of averaging over recipes of all
sizes and is a quintessential feature of all regional cuisines
of India. Note that, across cuisines, the majority of recipes
are in the size range of around 3 to 12. Hence the food
pairing statistics are meaningful mainly below the
recipe size cut-off of ~12.

We further investigated possible factors that could
explain the negative food pairing pattern observed in regional
cuisines. We created randomized controls for each regional
cuisine to explore different aspects that may contribute
to the bias in food pairing. In the first control, the
frequency of occurrence of each ingredient was preserved
at the cuisine level (‘Ingredient frequency’). In the second
control, the category composition of each recipe was preserved
(‘Ingredient category’). A third, composite control
was created by preserving both the category composition of
each recipe and the frequency of occurrence of ingredients
(‘Category + Frequency’).

Interestingly, ingredient frequency turned out to be a
critical factor that could explain the observed bias in food
pairing as reflected in $N_s$ (Figure 4). The pattern of food
pairing across different size ranges of recipes is also consistent
with this observation (Figure 5). In contrast,
category composition by itself turned out to be irrelevant
and led to food pairing similar to that of a randomized
cuisine. Further, the control implementing a
composite model featuring both of the above aspects recreated
the food pairing observed in regional cuisines. Thus the
frequency of occurrence of ingredients emerged as the
central aspect in rendering the characteristic
food pairing.


D. Food pairing at recipe level 

Looking into food pairing at the recipe level, we analyzed
the distribution of food pairing among
recipes ($N_s^R$). Our analysis showed that the negative
$\Delta N_s$ observed for cuisines was not an averaging effect.
The $N_s^R$ values tend to follow an exponential distribution,
indicating that the number of recipes decays exponentially
with increasing $N_s^R$. To address the noise due to the small
size of cuisines, we computed the cumulative distribution
($P(\le N_s^R)$) as depicted in Figure 6. The cumulative
distribution for an exponential probability distribution
function ($P(N_s^R) \propto e^{-a N_s^R}$) would be of the
following form:

P(<iV.*) = »+ 1+ fc ~°„. (1) 
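As an illustration of why a cumulative distribution helps with small cuisines, the sketch below computes an empirical $P(\le N_s^R)$ from a list of recipe-level values. The function name and the sample values are hypothetical; the fitting step used in the study is not reproduced here.

```python
def cumulative_distribution(values):
    """Return sorted [(x, P(<= x))] for the distinct values in `values`."""
    n = len(values)
    ordered = sorted(values)
    cdf = []
    for idx, x in enumerate(ordered, start=1):
        if cdf and cdf[-1][0] == x:
            # Repeated value: keep one point carrying the largest fraction.
            cdf[-1] = (x, idx / n)
        else:
            cdf.append((x, idx / n))
    return cdf

# Hypothetical recipe-level food pairing values.
pairing_values = [0.0, 0.5, 0.5, 1.0, 3.0]
cdf = cumulative_distribution(pairing_values)
```

Unlike a histogram, the cumulative curve needs no binning, so every recipe contributes a point even when a cuisine has only a little over a hundred recipes.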

We found that all regional cuisines show a strong bias
towards recipes of low $N_s^R$ values, as observed in Figure 6.
For each regional cuisine, the bias was accentuated in
comparison to the corresponding random cuisines, as reflected
in the exponents shown in Section V B. Once again Mughlai
cuisine emerged as an outlier, as its $N_s^R$ distribution
was not clearly distinct from that of its
random control. Consistent with the observations made
with the $N_s$ and $\Delta N_s$ statistics (Figure 4 and Figure 5), we
found that controlling for frequency of occurrence of
ingredients reproduces the $N_s^R$ distribution across
all regional cuisines (barring the Mughlai cuisine). This
further highlights the role of ingredient frequency as a key
factor in specifying food pairing at the level of recipes as
well.

FIG. 4. $\Delta N_s$ and its statistical significance. The variation in $\Delta N_s$ for regional cuisines and corresponding random
controls, signifying the extent of bias in food pairing. Statistical significance of $\Delta N_s$ is shown in terms of Z-score. ‘Regional
cuisine’ refers to each of the eight cuisines analyzed; ‘Ingredient frequency’ refers to the frequency controlled random cuisine;
‘Ingredient category’ refers to the ingredient category controlled random cuisine; and ‘Category + Frequency’ refers to the random
control preserving both ingredient frequency and category. Among all regional cuisines, Mughlai cuisine showed the least negative
food pairing ($\Delta N_s = -0.758$) while Maharashtrian cuisine had the most negative food pairing ($\Delta N_s = -4.523$).


E. Food pairing at the level of ingredient pairs 

Beyond the level of cuisine and recipes, the bias in food
pairing can be studied at the level of ingredient pairs. We
computed the co-occurrence of ingredients in the cuisine for
increasing values of flavor profile overlap ($N$). We found
that the fraction of ingredient pairs with a certain
overlap of flavor profiles ($f(N)$) followed a power law
distribution $f(N) \propto N^{-\gamma}$ (Figure 7). This indicates that
the higher the flavor overlap between a pair of
ingredients, the lower its usage in these cuisines. Section V C lists
the $\gamma$ values for each of the regional cuisines.
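The text does not state how the exponent $\gamma$ was estimated. One common approach, shown here purely as an assumption, is an ordinary least-squares fit of $\log f(N)$ against $\log N$; the function name and the synthetic data are hypothetical.

```python
import math

def fit_power_law_exponent(overlaps, fractions):
    """Estimate gamma in f(N) ~ N^(-gamma) by least squares on log-log data.

    Assumption: a straight-line fit in log-log space; the study may have
    used a different estimator.
    """
    xs = [math.log(n) for n in overlaps]
    ys = [math.log(f) for f in fractions]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the slope is -gamma

# Synthetic data that follows an exact power law with gamma = 2.
overlaps = [1, 2, 4, 8, 16]
fractions = [n ** -2.0 for n in overlaps]
gamma = fit_power_law_exponent(overlaps, fractions)
```

Pairs with zero overlap have to be excluded before taking logarithms, and real data would scatter around the fitted line rather than lie on it.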

F. Contribution of individual ingredients towards 
food pairing 

For each of the regional cuisines we calculated the contribution
of ingredients ($\chi_i$) towards the food pairing pattern.
For an ingredient whose presence in the cuisine
does not lead to any bias, the value of $\chi_i$ is expected
to be around zero. With increasing role in biasing food
pairing towards the positive (negative) side, $\chi_i$ is expected
to be proportionately higher (lower). Figure 8 shows the
distribution of ingredient contribution ($\chi_i$) and
frequency of occurrence for each regional cuisine. Ingredients
that make a significant contribution towards food
pairing lie on either the positive or the negative
side, away from the neutral vertical axis at $\chi_i = 0$.
Significantly, spices were consistently present on the
negative side, while milk and certain dairy products were
present on the positive side across cuisines. Prominent
among the spices, cayenne consistently contributed to the
negative food pairing of all regional cuisines. Certain
ingredients appeared to be ambivalent in their contribution
to food pairing. While cardamom contributed to
positive food pairing in Gujarati, Mughlai, Rajasthani, and
South Indian cuisines, it added to negative food pairing
in Maharashtrian cuisine. Green bell pepper tends to
contribute to negative food pairing across the cuisines,
except in the case of Rajasthani cuisine. Details of $\chi_i$
values of prominent ingredients for each regional cuisine
are presented in Section V D.

FIG. 5. Variation in average $N_s$ and its statistical significance. Change in $N_s$ with varying recipe size cut-offs reveals
the nature of food pairing across the spectrum of recipe sizes. The $N_s$ values for regional cuisines were consistently on the
lower side compared to their random counterparts. The category controlled random cuisine displayed average $N_s$ variation close
to that of the ‘Random control’. Frequency controlled as well as ‘Category + Frequency’ controlled random cuisines, on the
other hand, displayed average $N_s$ variations close to that of the real-world cuisine.


G. Role of ingredient categories in food pairing 

As discussed earlier, the random cuisine in which only the
category composition of recipes was conserved tends to
have food pairing similar to that of the ‘Random control’
(Figure 4 and Figure 5). This raises the question of
whether ingredient category has any role in determining
the food pairing pattern of the cuisine. To answer
this question, we created random cuisines wherein we
randomized ingredients within one category, while preserving
the category and frequency distribution for the rest
of the ingredients. The extent of contribution of an
ingredient category towards the observed food pairing in
the cuisine is represented by $\Delta N_s^{cat}$. Figure 9 depicts
the significance of ingredient categories towards food pairing
of each regional cuisine. Interestingly, the pattern of
category contributions presents itself as a ‘culinary
fingerprint’ of the cuisine.

The ‘spice’ category was the most significant contributor
to negative food pairing across cuisines, with the
exception of Mughlai cuisine. Another category which
consistently contributed to negative food pairing was ‘dairy’.
On the other hand, ‘vegetable’ and ‘fruit’ categories tend
to bias most cuisines towards positive food pairing. Compared
to the above-mentioned categories, ‘nut/seed’,
‘cereal/crop’, ‘pulse’ and ‘plant derivative’ did not show any
consistent trend. ‘Plant’ and ‘herb’ categories, sparsely
represented in cuisines, tend to tilt the food pairing
towards the positive side. In Mughlai cuisine all ingredient
categories, except ‘dairy’, tend to contribute towards
positive food pairing. This could be a reflection of the meagre
negative food pairing observed for the cuisine (Figure 4).
The above observations were found to be consistent across
the spectrum of recipe sizes (Figure 10).

[Figure 6 panels: (a) Bengali, (b) Gujarati, (c) Jain, (d) Maharashtrian, (e) Mughlai, (f) Punjabi, (g) Rajasthani, (h) South Indian]

FIG. 6. Cumulative probability distribution of $N_s^R$ values for regional cuisines and their random controls.
The cumulative distribution of $N_s^R$ indicates the probability of finding a recipe having food pairing less than or equal to $N_s^R$. The
data of regional cuisines as well as those of their controls were fitted with a sigmoid equation, indicating that the $P(N_s^R)$ values
fall exponentially. The exponent $a$ (Equation 1) refers to the rate of decay; the larger the $a$, the more prominent the negative food
pairing in recipes of a cuisine. As evident from Section V B, the distributions of the controls based on ‘Ingredient Frequency’ as well as
‘Category + Frequency’ displayed recipe level food pairing similar to real-world cuisines. On the other hand, as also observed
at the level of cuisine (Figure 4 and Figure 5), both the ‘Random Control’ and the ‘Ingredient Category’ control deviate
significantly.


III. CONCLUSIONS 

With the help of data analytical techniques we have
shown that food pairing in major Indian regional cuisines
follows a consistent trend. We analyzed the reason behind
this characteristic pattern and found that spices,
individually and as a category, play a crucial role in rendering
the negative food pairing of the cuisines. The use
of spices as a part of diet dates back to the ancient Indus
civilization of the Indian subcontinent [?]. They also find
mention in Ayurvedic texts such as Charaka Samhita and
Bhaavprakash Nighantu [20-23]. Trikatu, an Ayurvedic
formulation prescribed routinely for a variety of diseases,
is a combination of spices, viz. long pepper, black pepper
and ginger [23]. Historically spices have served several
purposes, such as coloring and flavoring agents, preservatives
and additives. They also serve as anti-oxidant,
anti-inflammatory, chemopreventive, antimutagenic and
detoxifying agents [?, 25]. One of the strongest hypotheses
proposed to explain the use of spices is the antimicrobial
hypothesis, which suggests that spices are primarily
used due to their activity against food spoilage bacteria
[?, 26]. A few of the most antimicrobial spices [27]
are commonly used in Indian cuisines. Our recent studies
have shown the beneficial role of capsaicin, an active
component in cayenne, which was revealed to be the
most prominent ingredient in consistently rendering the
negative food pairing in all regional cuisines [25]. The
importance of spices in Indian regional cuisines is also
highlighted by the fact that these cuisines have many derived
ingredients (such as garam masala, ginger garlic paste
etc.) that are spice combinations. The key role of spices
in rendering characteristic food pairing in Indian cuisines,
and the fact that they are known to be of therapeutic
potential, provide a basis for exploring a possible causal
connection between diet and health as well as prospecting
therapeutic molecules from food ingredients. Flavor
pairing has been used as a basic principle in algorithm
design for both recipe recommendation and novel
recipe generation, thereby enabling computational systems
to enter the creative domain of cooking and suggesting
recipes [?]. In such algorithms, candidate recipes
are generated based on existing domain knowledge, and
flavor pairing plays a crucial role in selecting the best
among these candidates [18].

FIG. 7. Co-occurrence of ingredients with increasing extent of flavor profile overlap. The fraction of ingredient pair
occurrence ($f(N)$) with a certain extent of flavor profile overlap ($N$) was computed to assess the nature of food pairing at the
level of ingredient pairs. Across the cuisines it was generally observed that the occurrence of ingredient pairs dropped as a
power law with increasing extent of flavor profile sharing. This further confirmed the negative food pairing pattern in regional
cuisines, beyond the coarse-grained levels of cuisine and recipes.

IV. MATERIALS AND METHODS 
A. Data collection and curation 

The data of regional cuisines were obtained from
one of the leading cookery websites of Indian cuisine,
tarladalal.com (December 2014). Among the various
online resources available for Indian cuisine,
TarlaDalal [19] (http://www.tarladalal.com) was
found to be the best in terms of authentic recipes,
cuisine annotations and coverage across major regional
cuisines. The website had 3330 recipes from
8 Indian cuisines. Among other online sources:
Sanjeev Kapoor (http://www.sanjeevkapoor.com)
had 3399 recipes from 23 Indian cuisines; NDTV
Cooks (http://cooks.ndtv.com) had 667 Indian
recipes across 15 cuisines; Manjulas Kitchen
(http://www.manjulaskitchen.com) was restricted
to 730 Indian vegetarian recipes across 19 food categories;
Recipes Indian (http://www.recipesindian.com)
had 891 recipes from around 16 food categories; All
Recipes (http://www.allrecipes.com) had only 449
recipes from 6 food categories. In comparison to these
sources, Tarladalal.com was identified as the best recipe
source for Indian cuisine.

The data of 3330 recipes and 588 ingredients were
curated to remove redundancy in names and to drop recipes
with only one ingredient. These ingredients belonged
to 17 categories. Ingredients of the ‘snack’ and ‘additive’
categories, for which no flavor compounds could be
determined, were removed. The ingredients were further
aliased to 339 source ingredients, out of which we could
determine flavor profiles for 194. Aliasing involves
mapping ingredients to their source ingredient.
For example, ‘chopped potato’ and ‘mashed potato’ were
aliased to ‘potato’. The final data comprised 2543
recipes and 194 ingredients belonging to 15 categories.
The statistics of regional cuisines, their recipes and
ingredient counts are provided in Table I.
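The curation steps can be illustrated with a small sketch. The alias table actually used in the study is not given, so the preparation-word list below is a hypothetical stand-in, and applying the one-ingredient filter after aliasing is likewise an assumption.

```python
import re

# Hypothetical preparation words; the study's real alias table is not given.
PREPARATION_WORDS = {"chopped", "mashed", "sliced", "grated", "boiled"}

def alias(name):
    """Map an ingredient name to its source ingredient, e.g. 'chopped potato' to 'potato'."""
    tokens = re.findall(r"[a-z]+", name.lower())
    kept = [t for t in tokens if t not in PREPARATION_WORDS]
    return " ".join(kept)

def alias_recipe(ingredients):
    """Alias every ingredient and drop the duplicates that aliasing creates."""
    return sorted({alias(i) for i in ingredients})

def curate(recipes):
    """Alias all recipes, then drop those left with fewer than two ingredients."""
    aliased = [alias_recipe(r) for r in recipes]
    return [r for r in aliased if len(r) >= 2]

raw = [["chopped potato", "mashed potato"], ["milk", "Chopped Potato"]]
curated = curate(raw)
```

A hand-curated dictionary from raw names to source ingredients would be more reliable than word stripping for a real dataset; the rule-based version is used here only to keep the example short.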

The data of flavor compounds were obtained from
Ahn et al. [15], Fenaroli’s Handbook of Flavor Compounds
[25] and an extensive literature search. All the flavor
profiles were cross-checked with those in the 6th (latest)
edition of Fenaroli’s Handbook of Flavor Compounds [29] for
consistency of names. Chemical Abstract Service numbers
were used as unique identifiers of flavor molecules.

FIG. 8. Contribution of ingredients ($\chi_i$) towards flavor pairing. For all eight regional cuisines we calculated the $\chi_i$ value
of ingredients, which indicates their contribution to the flavor pairing pattern of the cuisine, and plotted it against their frequency
of appearance. The size of each circle is proportional to the frequency of the ingredient. Across cuisines, prominent negative contributors
largely comprised spices, whereas a few dairy products consistently appeared on the positive side.


B. Flavor sharing 

Flavor sharing was computed for each pair of ingredients
that co-occur in recipes in terms of the number of shared
compounds $N = |F_i \cap F_j|$. Further, the average number
of shared compounds in a recipe, $N_s^R$, having $s$ ingredients
was calculated (Equation 2).

$N_s^R = \frac{2}{s(s-1)} \sum_{i,j \in R,\; i < j} |F_i \cap F_j|$ (2)

where $F_i$ represents the flavor profile of ingredient $i$ and
$R$ represents a recipe.

For a cuisine with $N_R$ recipes, we then calculated the
average flavor sharing of the cuisine, $N_s = \frac{1}{N_R} \sum_R N_s^R$.
Figure 3 illustrates this procedure graphically. We compared
the average $N_s$ of the cuisine with that of the corresponding
randomized cuisine (Figure 4) by calculating $\Delta N_s$
($= N_s^{cuisine} - N_s^{Rand}$), where ‘cuisine’ and ‘Rand’ indicate
the regional cuisine and the corresponding ‘random cuisine’
respectively.

A total of four random controls were created, viz. ‘Random
control’, ‘Ingredient frequency’, ‘Ingredient category’
and ‘Category + Frequency’. In all random
cuisines the recipe size distribution of the original cuisine was
preserved. The ‘Random control’ implemented uniform selection
of ingredients (1 set of 10,000 recipes for each regional
cuisine); the ‘Ingredient frequency’ control was created
while maintaining the ingredient usage frequency distribution
(1 set of 10,000 recipes for each regional cuisine);
the ‘Ingredient category’ control was created by randomizing
ingredient usage in recipes among ingredients belonging to
the same category, thus maintaining the category composition
of recipes (8 sets of recipes for a total of > 10,000
recipes for each regional cuisine); and the ‘Category + Frequency’
control preserved both the ingredient categories
in recipes and the frequency of overall ingredient usage
within the cuisine (8 sets of recipes for a total of > 10,000
recipes for each regional cuisine).
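The first two controls, together with $\Delta N_s$ and the Z-score that this section goes on to define, can be sketched as follows. This is an illustrative approximation under stated assumptions: recipe sizes are resampled from the observed sizes, frequency preservation is approximated by frequency-weighted draws with duplicates rejected, and all names and toy data are hypothetical.

```python
import random
import statistics
from collections import Counter
from itertools import combinations

def recipe_pairing(recipe, flavor):
    """Mean number of shared compounds over all ingredient pairs of a recipe."""
    pairs = list(combinations(sorted(recipe), 2))
    return sum(len(flavor[a] & flavor[b]) for a, b in pairs) / len(pairs)

def random_cuisine(recipes, n_recipes, rng, preserve_frequency=False):
    """Sketch of the 'Random control' (uniform) or 'Ingredient frequency' control."""
    usage = Counter(i for r in recipes for i in r)
    names = sorted(usage)
    weights = [usage[n] if preserve_frequency else 1 for n in names]
    sizes = [len(r) for r in recipes]
    cuisine = []
    for _ in range(n_recipes):
        size = rng.choice(sizes)  # resample an observed recipe size
        recipe = set()
        while len(recipe) < size:  # the set silently rejects duplicates
            recipe.add(rng.choices(names, weights=weights, k=1)[0])
        cuisine.append(recipe)
    return cuisine

def delta_and_z(real, rand, flavor):
    """Return (Delta N_s, Z) with Z = sqrt(N_Rand) * Delta N_s / sigma_Rand."""
    real_vals = [recipe_pairing(r, flavor) for r in real]
    rand_vals = [recipe_pairing(r, flavor) for r in rand]
    delta = statistics.mean(real_vals) - statistics.mean(rand_vals)
    sigma = statistics.pstdev(rand_vals)
    return delta, (len(rand_vals) ** 0.5) * delta / sigma

# Toy data: two groups that share compounds internally but not with each other.
flavor = {
    "a1": {"x", "y"}, "a2": {"x", "y"}, "a3": {"x", "y"},
    "b1": {"z", "w"}, "b2": {"z", "w"}, "b3": {"z", "w"},
}
real = [{"a1", "b1"}, {"a2", "b2"}, {"a3", "b3"}, {"a1", "b2"}]
rng = random.Random(42)
rand = random_cuisine(real, 1000, rng, preserve_frequency=True)
delta, z = delta_and_z(real, rand, flavor)
```

Because every toy recipe pairs ingredients from different groups, the real cuisine has zero flavor sharing and both `delta` and `z` come out negative, which mirrors the sign convention for negative food pairing.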

The statistical significance of $N_s$ and $\Delta N_s$ was measured
with corresponding Z-scores given by

$Z = \sqrt{N_{Rand}} \, \frac{N_s^{cuisine} - N_s^{Rand}}{\sigma_{Rand}}$ (3)

where $N_{Rand}$ and $\sigma_{Rand}$ represent the number of recipes
in the randomized cuisine and the standard deviation of $N_s^R$
values for the randomized cuisine respectively.

FIG. 9. Contribution of individual categories ($\Delta N_s^{cat}$) towards food pairing bias and its statistical significance.
Randomizing ingredients within a certain category provides an insight into their contribution towards bias in food pairing.
Spice and dairy showed up as prominent categories contributing to the negative food pairing of regional cuisines.


C. Ingredient contribution

For every regional cuisine, the contribution ($\chi_i$) of each
ingredient $i$ was calculated using Equation 4.

$\chi_i = \left( \frac{1}{N_R} \sum_{R \ni i} \frac{2}{n(n-1)} \sum_{j \neq i \, (j, i \in R)} |F_i \cap F_j| \right) - \left( \frac{2 f_i}{N_R \langle n \rangle} \, \frac{\sum_{j \in c} f_j |F_i \cap F_j|}{\sum_{j \in c} f_j} \right)$ (4)

Here, $f_i$ is the frequency of occurrence of ingredient $i$.
$\chi_i$ values reflect the extent of an ingredient’s contribution
towards positive or negative food pairing of the cuisine.
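A direct transcription of the ingredient contribution into code is sketched below. It assumes that $n$ is the size of each recipe, that $\langle n \rangle$ is the mean recipe size, and that the sums over $j \in c$ run over all other ingredients of the cuisine; these readings, the function name and the toy data are assumptions rather than details confirmed by the text.

```python
from collections import Counter

def ingredient_contribution(target, recipes, flavor):
    """chi_i: observed flavor sharing of `target` minus a frequency-based expectation."""
    n_recipes = len(recipes)
    freq = Counter(i for r in recipes for i in r)
    mean_size = sum(len(r) for r in recipes) / n_recipes

    # First term: sharing of `target` with its partners, over recipes containing it.
    observed = 0.0
    for r in recipes:
        if target in r:
            n = len(r)
            shared = sum(len(flavor[target] & flavor[j]) for j in r if j != target)
            observed += 2.0 / (n * (n - 1)) * shared
    observed /= n_recipes

    # Second term: sharing expected if partners were drawn by usage frequency.
    # Assumption: j ranges over all other ingredients of the cuisine.
    others = [j for j in freq if j != target]
    weighted = sum(freq[j] * len(flavor[target] & flavor[j]) for j in others)
    total = sum(freq[j] for j in others)
    expected = (2.0 * freq[target] / (n_recipes * mean_size)) * weighted / total
    return observed - expected

# Hypothetical toy cuisine.
flavor = {
    "milk": {"c1", "c2"}, "butter": {"c1", "c2"},
    "cayenne": {"c3"}, "cumin": {"c3"},
}
recipes = [{"milk", "butter"}, {"cayenne", "milk"}, {"cumin", "butter"}]
chi_cayenne = ingredient_contribution("cayenne", recipes, flavor)
chi_milk = ingredient_contribution("milk", recipes, flavor)
```

In this toy cuisine cayenne is never combined with cumin, the only ingredient it shares a compound with, so its contribution is negative, while milk shares exactly as much as its usage frequency would predict.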


D. Uniqueness of ingredient category 

Despite significant flavor sharing within each category
of ingredients, the uniqueness of each category, by virtue
of the combination of its ingredients with other ingredients,
was assessed by intra-category randomization. The
average food pairing of such a cuisine, randomized for a
category, was compared with that of the original cuisine.
Such category-randomized cuisines were created only for
major categories (having 5 or more ingredients) within
each regional cuisine. The deviation in $N_s$, which reflects
the relevance of the unique placement of ingredients of cat,
was calculated using Equation 5.

$\Delta N_s^{cat} = N_s^{cuisine} - N_s^{Rand,cat}, \quad \forall s \ge 2$ (5)

Here, cat stands for an ingredient category, $N_s^{Rand,cat}$ denotes
the average food pairing of the cuisine randomized for that
category, and $s$ represents recipe size. The statistical
significance was again calculated using Z-score.

FIG. 10. Variation in category contribution and its statistical significance. Across the spectrum of recipe sizes, we
observed a broadly consistent trend in the contribution of individual categories towards food pairing bias.
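Intra-category randomization can be sketched as below. The sketch assumes that replacements are drawn uniformly, without repetition inside a recipe, from the members of the category that occur in the cuisine; the text does not spell out the sampling scheme, and the category map and recipes are hypothetical.

```python
import random

def randomize_category(recipes, category_of, target_category, rng):
    """Shuffle which members of one category appear in each recipe.

    Ingredients from other categories are left untouched, and each recipe
    keeps its size and its number of ingredients per category.
    """
    members = sorted(
        {i for r in recipes for i in r if category_of[i] == target_category}
    )
    randomized = []
    for r in recipes:
        fixed = {i for i in r if category_of[i] != target_category}
        k = len(r) - len(fixed)  # how many members of the category to redraw
        swapped = set(rng.sample(members, k))
        randomized.append(fixed | swapped)
    return randomized

# Hypothetical category map and recipes.
category_of = {
    "cumin": "spice", "cayenne": "spice", "turmeric": "spice",
    "milk": "dairy", "rice": "cereal/crop",
}
recipes = [{"cumin", "milk"}, {"cayenne", "turmeric", "rice"}, {"milk", "rice"}]
shuffled = randomize_category(recipes, category_of, "spice", random.Random(0))
```

Comparing the average food pairing of `shuffled` with that of `recipes` would then give a toy value of $\Delta N_s^{cat}$ for the spice category.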


V. SUPPORTING INFORMATION 

A. SI Table 

Distribution of ingredients across categories. 

Number of ingredients in each category for all regional 
cuisines. 


B. S2 Table 

Exponents (a) of Sigmoid fits for P(N^) vs 
distribution. Exponents (a) for regional cuisines and 
their random controls. 


C. S3 Table 

Power law exponents (7) for /(TV) vs N distri¬ 
bution. Power law exponents ( 7 ) of all regional cuisines. 

D. S4 Table 

Ingredients contributing significantly to food 
pairing. Details of top 10 ingredients contributing to 
positive and negative food pairing in each of the regional 
cuisines. 
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Supporting Information: Analysis of food pairing in regional cuisines of India 


I. SUPPORTING TABLES 

A. SI Table 


TABLE I. Distribution of ingredients across categories. Number of ingredients in each category for all regional cuisines. 


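| Ingredient category | Bengali | Gujarati | Jain | Maharashtrian | Mughlai | Punjabi | Rajasthani | South Indian |
|---|---|---|---|---|---|---|---|---|
| spice | 25 | 23 | 26 | 25 | 24 | 33 | 21 | 25 |
| vegetable | 14 | 23 | 29 | 14 | 15 | 29 | 16 | 23 |
| fruit | 13 | 19 | 25 | 9 | 16 | 22 | 5 | 14 |
| plant derivative | 8 | 7 | 11 | 7 | 8 | 13 | 4 | 6 |
| nut/seed | 12 | 12 | 12 | 11 | 11 | 13 | 8 | 10 |
| cereal/crop | 6 | 10 | 11 | 6 | 9 | 12 | 7 | 9 |
| dairy | 7 | 6 | 8 | 6 | 7 | 10 | 5 | 7 |
| plant | 2 | 3 | 3 | 3 | 4 | 5 | 4 | 5 |
| pulse | 4 | 6 | 5 | 4 | 5 | 6 | 5 | 6 |
| herb | 2 | 2 | 5 | 3 | 3 | 4 | 2 | 3 |
| meat | 3 | 0 | 0 | 2 | 0 | 1 | 0 | 0 |
| beverage | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| fish/seafood | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| animal product | 2 | 0 | 1 | 1 | 2 | 2 | 0 | 2 |
| flower | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| additive | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |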
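As a small worked example on Table I, the sketch below sums each cuisine's column and computes the fraction of its ingredients that are spices. The variable names are illustrative, and treating the column sum as the cuisine's ingredient count assumes that every ingredient is assigned to exactly one category.

```python
cuisines = ["Bengali", "Gujarati", "Jain", "Maharashtrian",
            "Mughlai", "Punjabi", "Rajasthani", "South Indian"]

# Table I rows: category -> ingredient counts, in the cuisine order above.
table_1 = {
    "spice":            [25, 23, 26, 25, 24, 33, 21, 25],
    "vegetable":        [14, 23, 29, 14, 15, 29, 16, 23],
    "fruit":            [13, 19, 25,  9, 16, 22,  5, 14],
    "plant derivative": [ 8,  7, 11,  7,  8, 13,  4,  6],
    "nut/seed":         [12, 12, 12, 11, 11, 13,  8, 10],
    "cereal/crop":      [ 6, 10, 11,  6,  9, 12,  7,  9],
    "dairy":            [ 7,  6,  8,  6,  7, 10,  5,  7],
    "plant":            [ 2,  3,  3,  3,  4,  5,  4,  5],
    "pulse":            [ 4,  6,  5,  4,  5,  6,  5,  6],
    "herb":             [ 2,  2,  5,  3,  3,  4,  2,  3],
    "meat":             [ 3,  0,  0,  2,  0,  1,  0,  0],
    "beverage":         [ 1,  0,  1,  1,  0,  1,  0,  0],
    "fish/seafood":     [ 2,  0,  0,  0,  0,  0,  0,  2],
    "animal product":   [ 2,  0,  1,  1,  2,  2,  0,  2],
    "flower":           [ 1,  1,  1,  1,  1,  1,  1,  1],
    "additive":         [ 0,  0,  0,  0,  0,  0,  0,  1],
}

# Column sums: ingredients counted across all categories, per cuisine.
totals = [sum(column) for column in zip(*table_1.values())]

# Fraction of each cuisine's ingredients that fall in the spice category.
spice_share = {c: table_1["spice"][i] / totals[i] for i, c in enumerate(cuisines)}

for c, t in zip(cuisines, totals):
    print(f"{c:14s} total={t:3d}  spice share={spice_share[c]:.2f}")
```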


B. S2 Table 


TABLE II. Exponents (a) of sigmoid fits for $P(N_s^R)$ vs $N_s^R$ distribution. Exponents (a) for regional cuisines and their
random controls.


All entries are a values.

| Cuisine | Original | Random control | Ingredient frequency | Ingredient category | Category + Frequency |
|---|---|---|---|---|---|
| Bengali | 0.255525 | 0.181436 | 0.255149 | 0.190506 | 0.26209 |
| Gujarati | 0.405862 | 0.187475 | 0.365109 | 0.207978 | 0.37633 |
| Jain | 0.226656 | 0.155991 | 0.235283 | 0.138507 | 0.228731 |
| Maharashtrian | 0.282265 | 0.158809 | 0.259422 | 0.141178 | 0.269226 |
| Mughlai | 0.184891 | 0.173672 | 0.202563 | 0.143178 | 0.194965 |
| Punjabi | 0.207118 | 0.150068 | 0.207771 | 0.120212 | 0.215736 |
| Rajasthani | 0.315478 | 0.223507 | 0.35912 | 0.209513 | 0.351726 |
| South Indian | 0.300892 | 0.189509 | 0.280907 | 0.213137 | 0.290387 |
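One way to read Table II is to ask, for each cuisine, which control reproduces the exponent of the original cuisine most closely. The sketch below does this with the tabulated values. The short control labels and the column order (Original, Random control, Ingredient frequency, Ingredient category, Category + Frequency) follow the table as extracted and should be treated as an assumption about its layout.

```python
controls = ["random", "frequency", "category", "category+frequency"]

# Table II: cuisine -> (original a, then a for each control in the order above).
table_2 = {
    "Bengali":       (0.255525, 0.181436, 0.255149, 0.190506, 0.26209),
    "Gujarati":      (0.405862, 0.187475, 0.365109, 0.207978, 0.37633),
    "Jain":          (0.226656, 0.155991, 0.235283, 0.138507, 0.228731),
    "Maharashtrian": (0.282265, 0.158809, 0.259422, 0.141178, 0.269226),
    "Mughlai":       (0.184891, 0.173672, 0.202563, 0.143178, 0.194965),
    "Punjabi":       (0.207118, 0.150068, 0.207771, 0.120212, 0.215736),
    "Rajasthani":    (0.315478, 0.223507, 0.35912,  0.209513, 0.351726),
    "South Indian":  (0.300892, 0.189509, 0.280907, 0.213137, 0.290387),
}

# For each cuisine, find the control whose exponent is nearest the original.
closest = {}
for cuisine, (original, *control_values) in table_2.items():
    gaps = [abs(value - original) for value in control_values]
    closest[cuisine] = controls[gaps.index(min(gaps))]

for cuisine, name in closest.items():
    print(f"{cuisine}: {name}")
```

Under this reading, the nearest control in every cuisine is one that preserves ingredient frequency, which is in line with the stated finding that frequency of ingredient usage is central to the food pairing pattern.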

C. S3 Table


TABLE III. Power law exponent ($\gamma$) for $f(N)$ vs $N$ distribution. Power law exponent ($\gamma$) for all regional cuisines.


| Cuisine | $\gamma$ value |
|---|---|
| Bengali | 1.71906 |
| Gujarati | 2.11136 |
| Jain | 1.77156 |
| Maharashtrian | 1.6974 |
| Mughlai | 1.47354 |
| Punjabi | 1.55844 |
| Rajasthani | 2.62489 |
| South Indian | 1.948 |
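For readers who want to see how an exponent of this kind can be estimated, the sketch below fits a straight line to log f(N) against log N by least squares. It assumes the form f(N) ∝ N^(-γ); the supporting tables do not state the fitting procedure, so this is one common approach rather than the method used here, and the function name and synthetic data are hypothetical.

```python
import math

def power_law_exponent(xs, ys):
    """Least-squares slope of log(y) against log(x).

    Returns gamma under the assumed form y ~ x**(-gamma).
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    variance = sum((a - mean_x) ** 2 for a in log_x)
    return -covariance / variance

# Synthetic, noise-free data generated with gamma = 1.7 (not data from the study).
xs = [1, 2, 4, 8, 16]
ys = [100.0 * x ** -1.7 for x in xs]

gamma = power_law_exponent(xs, ys)
print(round(gamma, 6))
```

On real, noisy distributions a log-log regression can be biased, so maximum-likelihood estimators are often preferred for power-law exponents.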


D. S4 Table 


TABLE IV. Ingredients contributing significantly to food pairing. Details of top 10 ingredients contributing to positive and negative
food pairing in each of the regional cuisines.


Bengali

| Ingredients contributing to negative food pairing | $\chi$ value | Frequency of occurrence | Ingredients contributing to positive food pairing | $\chi$ value | Frequency of occurrence |
|---|---|---|---|---|---|
| coriander | -0.24319 | 40 | milk | 0.84165 | 31 |
| ginger garlic paste | -0.21437 | 16 | cottage cheese | 0.38636 | 11 |
| garam masala | -0.20126 | 14 | orange | 0.21789 | 4 |
| mango | -0.19701 | 13 | buttermilk | 0.17259 | 25 |
| cayenne | -0.13469 | 65 | coconut | 0.13006 | 12 |
| tomato | -0.11413 | 14 | rose | 0.12178 | 5 |
| tamarind | -0.11053 | 9 | cocoa | 0.08218 | 5 |
| green bell pepper | -0.10233 | 26 | strawberry | 0.05512 | 2 |
| cumin | -0.06875 | 36 | cream | 0.05368 | 5 |
| mung bean | -0.06702 | 4 | saffron | 0.05329 | 14 |

Gujarati

| Ingredients contributing to negative food pairing | $\chi$ value | Frequency of occurrence | Ingredients contributing to positive food pairing | $\chi$ value | Frequency of occurrence |
|---|---|---|---|---|---|
| green bell pepper | -0.29066 | 169 | cardamom | 0.17035 | 43 |
| cayenne | -0.19164 | 145 | milk | 0.158002 | 34 |
| mung bean | -0.09783 | 37 | mango | 0.15628 | 20 |
| coriander | -0.05721 | 45 | lemon | 0.11942 | 31 |
| garam masala | -0.05695 | 26 | strawberry | 0.07485 | 2 |
| black pepper | -0.05281 | 33 | chaat masala | 0.06775 | 4 |
| asafoetida | -0.04863 | 169 | apple | 0.06058 | 2 |
| coriander cumin seeds powder | -0.04469 | 26 | mint | 0.05999 | 11 |
| sesame seed | -0.04148 | 62 | apricot | 0.05948 | 1 |
| turmeric | -0.03435 | 157 | cottage cheese | 0.05743 | 4 |

Jain

| Ingredients contributing to negative food pairing | $\chi$ value | Frequency of occurrence | Ingredients contributing to positive food pairing | $\chi$ value | Frequency of occurrence |
|---|---|---|---|---|---|
| cayenne | -0.18622 | 152 | butter | 1.22722 | 68 |
| garam masala | -0.14199 | 28 | milk | 0.85545 | 62 |
| mango | -0.11421 | 24 | bread | 0.26881 | 25 |
| black bean | -0.08291 | 33 | corn | 0.26018 | 29 |
| coriander | -0.06855 | 47 | cocoa | 0.14714 | 3 |
| tamarind | -0.06793 | 17 | cream | 0.11764 | 37 |
| black pepper | -0.06234 | 55 | peanut butter | 0.09925 | 4 |
| green bell pepper | -0.06095 | 112 | grape | 0.09078 | 4 |
| ginger | -0.06059 | 17 | cheese | 0.08762 | 11 |
| chaat masala | -0.05613 | 14 | strawberry | 0.08254 | 4 |

Maharashtrian

| Ingredients contributing to negative food pairing | $\chi$ value | Frequency of occurrence | Ingredients contributing to positive food pairing | $\chi$ value | Frequency of occurrence |
|---|---|---|---|---|---|
| cayenne | -0.20961 | 71 | strawberry | 0.18767 | 1 |
| green bell pepper | -0.16631 | 27 | apricot | 0.17937 | 1 |
| cardamom | -0.13171 | 28 | milk | 0.14751 | 11 |
| peanut | -0.11527 | 10 | butter | 0.09349 | 3 |
| tamarind | -0.11284 | 12 | cheese | 0.08038 | 1 |
| tomato | -0.10687 | 8 | coconut | 0.05239 | 22 |
| black bean | -0.09923 | 6 | sesame seed | 0.04636 | 6 |
| black pepper | -0.09723 | 16 | cream | 0.04274 | 2 |
| cinnamon | -0.08889 | 21 | cocoa | 0.04255 | 1 |
| coriander | -0.08271 | 30 | rice | 0.03092 | 11 |

Mughlai

| Ingredients contributing to negative food pairing | $\chi$ value | Frequency of occurrence | Ingredients contributing to positive food pairing | $\chi$ value | Frequency of occurrence |
|---|---|---|---|---|---|
| ginger | -0.22264 | 20 | milk | 0.95554 | 71 |
| garam masala | -0.22203 | 38 | rice | 0.46744 | 9 |
| clove | -0.1727 | 42 | bread | 0.16189 | 12 |
| cinnamon | -0.15605 | 33 | grape | 0.16132 | 3 |
| tomato | -0.13042 | 21 | mango | 0.14838 | 11 |
| ginger garlic paste | -0.10488 | 22 | lemon | 0.14672 | 8 |
| green bell pepper | -0.10483 | 33 | chaat masala | 0.13532 | 13 |
| cayenne | -0.09472 | 70 | honey | 0.12645 | 3 |
| coriander | -0.07582 | 38 | cream | 0.10899 | 38 |
| onion | -0.0696 | 29 | soybean | 0.08769 | 4 |

Punjabi

| Ingredients contributing to negative food pairing | $\chi$ value | Frequency of occurrence | Ingredients contributing to positive food pairing | $\chi$ value | Frequency of occurrence |
|---|---|---|---|---|---|
| garam masala | -0.18891 | 251 | milk | 0.16846 | 137 |
| green bell pepper | -0.14559 | 301 | bread | 0.12552 | 60 |
| cayenne | -0.1208 | 496 | butter | 0.10934 | 87 |
| tomato | -0.10311 | 137 | cheese | 0.09834 | 7 |
| mango | -0.10147 | 120 | corn | 0.05484 | 34 |
| ginger garlic paste | -0.09551 | 110 | lemon | 0.0488 | 80 |
| ginger | -0.08621 | 82 | cottage cheese | 0.03844 | 128 |
| coriander | -0.08364 | 243 | grape | 0.03832 | 4 |
| cinnamon | -0.06514 | 84 | honey | 0.03591 | 11 |
| clove | -0.05827 | 86 | olive | 0.03388 | 16 |

Rajasthani

| Ingredients contributing to negative food pairing | $\chi$ value | Frequency of occurrence | Ingredients contributing to positive food pairing | $\chi$ value | Frequency of occurrence |
|---|---|---|---|---|---|
| garam masala | -0.13817 | 15 | ginger | 0.21659 | 3 |
| coriander | -0.0901 | 35 | mango | 0.15163 | 21 |
| clove | -0.07852 | 16 | milk | 0.14564 | 21 |
| cumin | -0.07138 | 55 | corn | 0.09148 | 2 |
| cinnamon | -0.05325 | 9 | tamarind | 0.07795 | 4 |
| coriander cumin seeds powder | -0.04782 | 4 | cardamom | 0.03735 | 31 |
| asafoetida | -0.03663 | 40 | butter | 0.03672 | 2 |
| cayenne | -0.03646 | 80 | lemon | 0.02806 | 3 |
| potato | -0.03488 | 3 | bread | 0.02767 | 2 |
| black pepper | -0.03262 | 9 | green bell pepper | 0.02621 | 33 |

South Indian

| Ingredients contributing to negative food pairing | $\chi$ value | Frequency of occurrence | Ingredients contributing to positive food pairing | $\chi$ value | Frequency of occurrence |
|---|---|---|---|---|---|
| tamarind | -0.13638 | 87 | rice | 0.43068 | 119 |
| tomato | -0.11714 | 51 | garam masala | 0.25363 | 24 |
| green bell pepper | -0.11087 | 144 | butter | 0.19469 | 16 |
| cayenne | -0.09829 | 238 | black bean | 0.1833 | 150 |
| coriander | -0.06636 | 73 | coconut | 0.17749 | 68 |
| curry leaf | -0.05268 | 196 | mung bean | 0.13281 | 34 |
| peanut | -0.05027 | 16 | milk | 0.13233 | 26 |
| ginger | -0.04228 | 24 | cardamom | 0.06319 | 46 |
| lemon | -0.03363 | 20 | soybean | 0.04396 | 8 |
| cumin | -0.03177 | 135 | onion | 0.0302 | 72 |
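To compare the eight lists in Table IV, the sketch below collects the ingredient names from the negative and positive columns and finds those that appear in the top 10 of every cuisine. Only the names are used, not the contribution values; the helper name `shared_by_all` is illustrative.

```python
negative = {
    "Bengali": ["coriander", "ginger garlic paste", "garam masala", "mango", "cayenne",
                "tomato", "tamarind", "green bell pepper", "cumin", "mung bean"],
    "Gujarati": ["green bell pepper", "cayenne", "mung bean", "coriander", "garam masala",
                 "black pepper", "asafoetida", "coriander cumin seeds powder",
                 "sesame seed", "turmeric"],
    "Jain": ["cayenne", "garam masala", "mango", "black bean", "coriander", "tamarind",
             "black pepper", "green bell pepper", "ginger", "chaat masala"],
    "Maharashtrian": ["cayenne", "green bell pepper", "cardamom", "peanut", "tamarind",
                      "tomato", "black bean", "black pepper", "cinnamon", "coriander"],
    "Mughlai": ["ginger", "garam masala", "clove", "cinnamon", "tomato",
                "ginger garlic paste", "green bell pepper", "cayenne", "coriander", "onion"],
    "Punjabi": ["garam masala", "green bell pepper", "cayenne", "tomato", "mango",
                "ginger garlic paste", "ginger", "coriander", "cinnamon", "clove"],
    "Rajasthani": ["garam masala", "coriander", "clove", "cumin", "cinnamon",
                   "coriander cumin seeds powder", "asafoetida", "cayenne", "potato",
                   "black pepper"],
    "South Indian": ["tamarind", "tomato", "green bell pepper", "cayenne", "coriander",
                     "curry leaf", "peanut", "ginger", "lemon", "cumin"],
}

positive = {
    "Bengali": ["milk", "cottage cheese", "orange", "buttermilk", "coconut",
                "rose", "cocoa", "strawberry", "cream", "saffron"],
    "Gujarati": ["cardamom", "milk", "mango", "lemon", "strawberry",
                 "chaat masala", "apple", "mint", "apricot", "cottage cheese"],
    "Jain": ["butter", "milk", "bread", "corn", "cocoa",
             "cream", "peanut butter", "grape", "cheese", "strawberry"],
    "Maharashtrian": ["strawberry", "apricot", "milk", "butter", "cheese",
                      "coconut", "sesame seed", "cream", "cocoa", "rice"],
    "Mughlai": ["milk", "rice", "bread", "grape", "mango",
                "lemon", "chaat masala", "honey", "cream", "soybean"],
    "Punjabi": ["milk", "bread", "butter", "cheese", "corn",
                "lemon", "cottage cheese", "grape", "honey", "olive"],
    "Rajasthani": ["ginger", "mango", "milk", "corn", "tamarind",
                   "cardamom", "butter", "lemon", "bread", "green bell pepper"],
    "South Indian": ["rice", "garam masala", "butter", "black bean", "coconut",
                     "mung bean", "milk", "cardamom", "soybean", "onion"],
}

def shared_by_all(lists_by_cuisine):
    """Ingredients that appear in the top-10 list of every cuisine."""
    return set.intersection(*(set(names) for names in lists_by_cuisine.values()))

common_negative = sorted(shared_by_all(negative))
common_positive = sorted(shared_by_all(positive))
print(common_negative)
print(common_positive)
```

In these lists, two spices recur among the negative contributors of every cuisine and one dairy product recurs among the positive contributors, which matches the roles attributed to the spice and dairy categories.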




